EpiCompare compares different epigenetic datasets for benchmarking and quality control purposes. The report consists of three main sections:

  1. General Metrics: Metrics on fragments (duplication rate) and peaks (blacklisted peaks and widths)
  2. Peak Overlaps: Percentage and statistical significance of overlapping and unique peaks
  3. Functional Annotation: ChromHMM annotation, enrichment analysis and motif analysis of peaks.

Input peak files (4 samples in total):

  • File1: active_macs
  • File2: active_seacr
  • File3: H3K27me3_macs
  • File4: H3K27me3_seacr


1. General Metrics


Fragment Information

This information is displayed only if Picard summary metrics are provided (see the help manual).

  • Mapped_Fragments: Number of mapped read pairs in the file.
  • Duplication_Rate: Percentage of mapped fragments marked as duplicates.
  • Unique_Fragments: Number of mapped fragments not marked as duplicates.
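As a rough illustration of how the three fragment metrics relate, the Python sketch below derives them from two counts. It assumes paired-end data and that both counts are read pairs, such as the READ_PAIRS_EXAMINED and READ_PAIR_DUPLICATES fields of a Picard MarkDuplicates metrics file. Picard's own PERCENT_DUPLICATION also accounts for unpaired reads and is reported as a fraction, so EpiCompare's exact computation may differ. The helper `fragment_metrics` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FragmentMetrics:
    mapped_fragments: int    # mapped read pairs
    duplication_rate: float  # % of mapped pairs marked as duplicate
    unique_fragments: int    # mapped pairs not marked as duplicate


def fragment_metrics(mapped_pairs: int, duplicate_pairs: int) -> FragmentMetrics:
    """Derive the three report metrics from two read-pair counts (hypothetical helper).

    Assumes paired-end data, e.g. READ_PAIRS_EXAMINED and READ_PAIR_DUPLICATES
    from a Picard MarkDuplicates metrics file; unpaired reads are ignored here.
    """
    if mapped_pairs <= 0:
        raise ValueError("mapped_pairs must be positive")
    if not 0 <= duplicate_pairs <= mapped_pairs:
        raise ValueError("duplicate_pairs must lie between 0 and mapped_pairs")
    rate = 100.0 * duplicate_pairs / mapped_pairs
    return FragmentMetrics(
        mapped_fragments=mapped_pairs,
        duplication_rate=round(rate, 2),
        unique_fragments=mapped_pairs - duplicate_pairs,
    )


# Hypothetical counts, not taken from this report
print(fragment_metrics(mapped_pairs=1_000_000, duplicate_pairs=150_000))
```

Unique_Fragments is simply the complement of the duplicate count, so the duplication rate and the unique count always describe the same split of the mapped fragments.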


Peak Information

  • Total_N: Total number of peaks, including blacklisted peaks.
  • Blacklisted_Peaks: Percentage of peaks in the sample that fall in blacklisted regions. The ENCODE blacklist includes regions of the genome that have anomalous and/or unstructured signals independent of the cell line or experiment.


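| Sample | PeakN before tidy | Blacklisted peaks removed (%) | Non-standard peaks removed (%) | PeakN after tidy |
|---|---|---|---|---|
| active_macs | 2526 | 23.20 | 7.050 | 1762 |
| active_seacr | 3211 | 17.80 | 4.390 | 2497 |
| H3K27me3_macs | 89900 | 2.85 | 0.414 | 86965 |
| H3K27me3_seacr | 107569 | 2.93 | 0.435 | 103953 |

Both percentages are relative to PeakN before tidy (for active_macs, 2526 × (1 − 0.2320 − 0.0705) ≈ 1762).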
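The tidy step can be sketched in plain Python as below. This is not EpiCompare's implementation: the helper names are hypothetical, a peak counts as blacklisted if it overlaps a blacklist region by at least 1 bp, and "standard" chromosomes are assumed to be chr1 to chr22, chrX and chrY. Both percentages are computed against the original peak count, which is consistent with the numbers in the table above.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open as in BED

# Assumption: human-style chromosome names; adjust for other genomes.
STANDARD_CHROMS = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY"}


def overlaps_any(peak: Interval, regions: List[Interval]) -> bool:
    """True if the peak shares at least 1 bp with any region (linear scan)."""
    chrom, start, end = peak
    return any(c == chrom and start < e and s < end for c, s, e in regions)


def tidy_peaks(peaks: List[Interval], blacklist: List[Interval]) -> Dict[str, float]:
    """Drop blacklisted peaks, then peaks on non-standard chromosomes."""
    if not peaks:
        raise ValueError("no peaks to tidy")
    n_before = len(peaks)
    not_blacklisted = [p for p in peaks if not overlaps_any(p, blacklist)]
    tidy = [p for p in not_blacklisted if p[0] in STANDARD_CHROMS]
    n_blacklisted = n_before - len(not_blacklisted)
    n_non_standard = len(not_blacklisted) - len(tidy)
    return {
        "peakN_before": n_before,
        # both percentages use the original count as the denominator
        "blacklisted_pct": 100 * n_blacklisted / n_before,
        "non_standard_pct": 100 * n_non_standard / n_before,
        "peakN_after": len(tidy),
    }


peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80), ("chrUn_gl000220", 10, 40)]
blacklist = [("chr1", 150, 160)]
print(tidy_peaks(peaks, blacklist))
```

The linear scan over the blacklist keeps the example short; for genome-scale files an interval index (for example, GenomicRanges in R) is the usual choice.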

Peak widths

Distribution of peak widths in each sample after removing blacklisted peaks.
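A minimal sketch of the width calculation, assuming BED-style 0-based, half-open coordinates so that width = end - start. The helper names are hypothetical; the summary reports the quantities a boxplot of peak widths would show.

```python
import statistics
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open as in BED


def peak_widths(peaks: List[Interval]) -> List[int]:
    # Half-open coordinates: width is simply end - start
    return [end - start for _chrom, start, end in peaks]


def summarize_widths(peaks: List[Interval]) -> Dict[str, float]:
    """Five-number summary of peak widths (needs at least two peaks)."""
    widths = peak_widths(peaks)
    if len(widths) < 2:
        raise ValueError("need at least two peaks")
    q1, _q2, q3 = statistics.quantiles(widths, n=4)
    return {
        "n": len(widths),
        "min": min(widths),
        "q1": q1,
        "median": statistics.median(widths),
        "q3": q3,
        "max": max(widths),
    }


peaks = [("chr1", 0, 100), ("chr1", 1000, 1200), ("chr2", 0, 300), ("chr2", 500, 900), ("chr3", 0, 1000)]
print(summarize_widths(peaks))
```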



2. Peak Overlaps

Individual samples

Heatmap of percentage overlaps between input peak files. Hover over the heatmap for percentage values.
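The percentages behind such a heatmap can be sketched as an asymmetric matrix in which cell (A, B) is the percentage of peaks in A that overlap at least one peak in B. The 1 bp overlap criterion and the choice of the row sample as the denominator are assumptions made for this illustration, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open as in BED


def overlaps_any(peak: Interval, others: List[Interval]) -> bool:
    chrom, start, end = peak
    return any(c == chrom and start < e and s < end for c, s, e in others)


def percent_overlap(a: List[Interval], b: List[Interval]) -> float:
    """Percentage of peaks in a that overlap at least one peak in b."""
    if not a:
        return 0.0
    hits = sum(overlaps_any(p, b) for p in a)
    return 100.0 * hits / len(a)


def overlap_matrix(samples: Dict[str, List[Interval]]) -> Dict[str, Dict[str, float]]:
    """Row sample is the denominator, so the matrix is generally not symmetric."""
    names = list(samples)
    return {x: {y: percent_overlap(samples[x], samples[y]) for y in names} for x in names}


samples = {
    "A": [("chr1", 0, 100), ("chr1", 200, 300)],
    "B": [("chr1", 50, 60)],
}
for row, cols in overlap_matrix(samples).items():
    print(row, cols)
```

Because a broad sample can cover most peaks of a narrow one but not vice versa, reading the heatmap row by row and column by column gives different answers.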

Significance of overlapping vs unique peaks

The plot is displayed only if a reference peak file is provided and stat_plot = TRUE. The type of plot depends on the format of the reference file:

  • If the reference file has BED6+4 format (peaks called with MACS2), the plot is a paired boxplot showing the distribution of -log10(q-value) for overlapping and unique peaks in each sample.
  • If the reference file does not have BED6+4 format, the plot is a barplot of the percentage overlap per sample, coloured by adjusted p-value.
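BED6+4 here is the narrowPeak layout produced by MACS2, whose ninth column holds -log10(q-value), with -1 meaning no q-value was assigned. The sketch below reads that column and splits the peaks into overlapping and unique groups, the two distributions a paired boxplot would compare. Treating the BED6+4 file as the source of q-values and a sample as the comparison set is an assumption, and the helper names are hypothetical.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]      # (chrom, start, end)
QPeak = Tuple[str, int, int, float]  # (chrom, start, end, -log10 q-value)


def parse_narrowpeak(lines: Iterable[str]) -> List[QPeak]:
    """Read BED6+4 (narrowPeak) lines; column 9 is -log10(q-value)."""
    records = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            raise ValueError(f"expected 10 columns for BED6+4, got {len(fields)}")
        records.append((fields[0], int(fields[1]), int(fields[2]), float(fields[8])))
    return records


def split_qvalues(records: List[QPeak], other: List[Interval]) -> Tuple[List[float], List[float]]:
    """Split -log10(q) values by whether each peak overlaps any peak in other."""
    overlapping, unique = [], []
    for chrom, start, end, q in records:
        if q < 0:
            continue  # -1 means no q-value was assigned
        hit = any(c == chrom and start < e and s < end for c, s, e in other)
        (overlapping if hit else unique).append(q)
    return overlapping, unique


reference = parse_narrowpeak([
    "chr1\t100\t200\tpeak1\t50\t.\t5.0\t8.0\t6.5\t50",
    "chr1\t500\t600\tpeak2\t20\t.\t2.0\t3.0\t1.2\t40",
])
sample = [("chr1", 150, 160)]
print(split_qvalues(reference, sample))
```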


3. Functional Annotation

3.1 ChromHMM

ChromHMM annotates chromatin states. The annotations used were obtained from here.

All samples

ChromHMM annotation of individual peak files.

Overlapping peaks

ChromHMM annotation of peaks overlapping with the reference peak file.

Unique peaks

ChromHMM annotation of peaks that do not overlap the reference peak file.
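The per-state summaries above can be sketched as follows. The segmentation is assumed to be a list of (chrom, start, end, state) tuples, for example parsed from a ChromHMM segmentation BED file with the state label in the fourth column; a peak that spans several states is counted once for each state it touches. The helper is hypothetical and EpiCompare's own annotation may normalise differently. Running it on all peaks, on the overlapping peaks and on the unique peaks in turn gives the three views listed above.

```python
from collections import Counter
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]      # (chrom, start, end)
Segment = Tuple[str, int, int, str]  # (chrom, start, end, state)


def chromhmm_fractions(peaks: List[Interval], segments: List[Segment]) -> Dict[str, float]:
    """Fraction of peaks touching each chromatin state (hypothetical helper).

    A peak spanning several states counts once for each distinct state,
    so the fractions can sum to more than 1.
    """
    if not peaks:
        return {}
    counts: Counter = Counter()
    for chrom, start, end in peaks:
        states = {st for c, s, e, st in segments if c == chrom and start < e and s < end}
        counts.update(states)
    return {state: counts[state] / len(peaks) for state in sorted(counts)}


segments = [("chr1", 0, 1000, "1_TssA"), ("chr1", 1000, 5000, "15_Quies")]
peaks = [("chr1", 100, 200), ("chr1", 900, 1100), ("chr1", 2000, 2100)]
print(chromhmm_fractions(peaks, segments))
```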

3.2 Enrichment analysis

3.3 Motif analysis